Conditional information and definition of neighbor in
categorical random fields



Reza Hosseini, University of British Columbia, 
333-6356 Agricultural Road, Vancouver, BC, Canada, V6T1Z2 
rezal317@gmail.com 

Abstract 

We show that the definition of neighbor in Markov random fields given by
Besag (1974) is not well-defined when the joint distribution of the sites is
not positive. In a random field with a finite number of sites, we study the
conditions under which giving the value at extra sites changes the belief
of an agent about one site. We also study the conditions under which the
information from some sites is equivalent to giving the value at all other sites.
These concepts provide an alternative to the concept of neighbor for the general
case where the positivity condition of the joint distribution does not hold.

Keywords: Markov random fields; Neighbor; Conditional probability; Information

1 Introduction 

This paper studies conditional probabilities and the definition of neighbor in
categorical random fields. These can be used to describe spatial processes, e.g. in
plant ecology. We start with the common definition of neighbor in Markov random
fields and show that it is not well-defined when the joint distribution is
not positive. Then we provide a framework to study conditional probabilities
given various amounts of "information", for example the conditional probability of
one site given some others. Since the usual definition of neighbor is not well-defined
when the "positivity" condition of the joint distribution does not hold, we introduce
the new concepts of "uninformative set", "sufficient information set" and "minimal
information set".

Suppose we have a finite random field consisting of n sites. The belief of
an agent about one site can be summarized by a probability distribution, and it can
be changed to a conditional distribution by receiving new information, which can
be the value at some other sites. We study when the new information changes the
agent's belief and what is "sufficient" information for the agent, in the sense that
giving the information would be equivalent to giving the value of all other sites.
We answer some interesting questions along the way. For example, suppose agent 1
has less information than agent 2 regarding an event A and new information is
released. Now suppose that agent 1 does not change his belief about A. One might
conjecture that since agent 2 has more information, he will not change his
belief after receiving the new information either. We show by counterexamples
that this conjecture is wrong.



2 Neighbor in categorical random fields 

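Suppose $(\Omega, \Sigma, P)$ is a probability space and $\{X_i\}_{i=1}^{n}$ is a stochastic process. Each
$X_i$ takes values in $M_i$, $|M_i| = m_i < \infty$, and $P(x_i) > 0$, $\forall x_i \in M_i$. We use the
shorthand notation:

$$P(x_i \mid x_{i_1}, \dots, x_{i_k}) = P(X_i = x_i \mid X_{i_1} = x_{i_1}, \dots, X_{i_k} = x_{i_k}).$$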



Besag (1974) and Cressie and Subash (1992) defined the neighbor as follows:



Definition 2.1 For site $i$, $i = 1, \dots, n$, site $j \neq i$ is called a neighbor if and only
if the functional form of $P(x_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$ is dependent on $x_j$.

Note that in the above definition, we need to make sure that the conditional
probability is defined. The above conditional probability is defined on

$$E_i = \{(x_1, \dots, x_n) : P(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) > 0\}.$$



We show in the following example that this definition is not well-defined in general,
since the functional form is not unique.

Example 2.1 Let $U_1, \dots, U_4$ denote a random sample from the uniform distribution
that takes only the values 0 and 1, each with probability 1/2. Define:

$$X_3 = [X_2] + U_4,$$

where $[\,\cdot\,]$ denotes the integer part of a real number. By the last equality above,
if we know the value of $X_2$, the value of $X_1$ gives no extra information about $X_3$.
Hence,

$$P(x_3 \mid x_2, x_1) = P(x_3 \mid x_2).$$

But since $[X_2] = [X_1]$, we also have

$$P(x_3 \mid x_2, x_1) = P(x_3 \mid x_1),$$

wherever the conditional probability is defined. This shows the definition of neighbor
is not well-defined in general.
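
The non-uniqueness in Example 2.1 can be checked by brute-force enumeration. The sketch below is only an illustration: the definitions of `x1` and `x2` are hypothetical (assumed here so that $[X_1] = [X_2]$ holds), and only the relation $X_3 = [X_2] + U_4$ is taken from the example. The helper names `sites`, `cond_prob` and `both_forms_agree` are likewise introduced for this sketch.

```python
from fractions import Fraction
from itertools import product
from math import floor


def sites(u1, u2, u3, u4):
    # HYPOTHETICAL X1 and X2 (assumed for illustration): both have integer part u1.
    x1 = u1 + Fraction(u2, 2)
    x2 = u1 + Fraction(u3, 2)
    # The one relation stated in the example: X3 = [X2] + U4.
    x3 = floor(x2) + u4
    return (x1, x2, x3)


# The 16 equally likely outcomes of (U1, U2, U3, U4), mapped to (X1, X2, X3).
OUTCOMES = [sites(*u) for u in product((0, 1), repeat=4)]


def cond_prob(x3, given):
    """P(X3 = x3 | X_k = v for each (k, v) in given); None if the condition has probability 0."""
    matching = [o for o in OUTCOMES if all(o[k] == v for k, v in given.items())]
    if not matching:
        return None
    return Fraction(sum(o[2] == x3 for o in matching), len(matching))


def both_forms_agree():
    """Check P(x3|x1,x2) = P(x3|x2) = P(x3|x1) on every outcome with positive probability."""
    for x1, x2, x3 in set(OUTCOMES):
        full = cond_prob(x3, {0: x1, 1: x2})
        if full != cond_prob(x3, {1: x2}) or full != cond_prob(x3, {0: x1}):
            return False
    return True
```

Under these assumed definitions, the same conditional probability can be written as a function of $x_2$ alone or of $x_1$ alone, so neither site can be singled out as "the" neighbor of site 3.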

Next we show that the positivity of the joint distribution implies that the
definition of neighbor is well-defined. By positivity of the joint distribution, we
mean

$$\forall x = (x_1, \dots, x_n) \in \prod_{i=1}^{n} M_i, \quad P(X_1 = x_1, \dots, X_n = x_n) > 0.$$

Lemma 2.1 Suppose $X_1, \dots, X_n$ is a categorical random field. If the joint distribution
is strictly positive, then the concept of neighbor is well-defined for this field.

Proof Suppose $\mathcal{J} = \{j_1, \dots, j_J\}$ and $\mathcal{H} = \{h_1, \dots, h_H\}$ are sets of neighbors of
site $i$. Hence,

$$P(x_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = f(x_{j_1}, \dots, x_{j_J}),$$

also,

$$P(x_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = g(x_{h_1}, \dots, x_{h_H}),$$

for some functions $f, g$. By the positivity condition, the conditional probability is
defined everywhere. Hence,

$$f(x_{j_1}, \dots, x_{j_J}) = g(x_{h_1}, \dots, x_{h_H}), \quad \forall x = (x_1, \dots, x_n) \in \prod_{i=1}^{n} M_i.$$

Suppose $h \in \mathcal{H} - \mathcal{J}$. Then $x_h$ does not appear on the left hand side, so $g$ is not
dependent on $x_h$. We conclude $\mathcal{H} - \mathcal{J} = \emptyset$. Similarly, $\mathcal{J} - \mathcal{H} = \emptyset$. ■

3 Uninformative information sets

In the following, we consider the general case (when the positivity condition does
not hold) and define some useful concepts which are well-defined even though the
concept of neighbor, as defined by Besag (1974), is not.

We start with some useful definitions and lemmas regarding conditional
probabilities. Consider the conditional probability $P(A \mid B)$ where $A, B$ are two events
and $P(B) > 0$. Also consider a third event $C$. It is interesting to study when $C$
changes (or does not change) our beliefs about the probability of $A$. Formally, we have
the following definition.

Definition 3.1 We call $C$ uninformative for $A$ given $B$ if

$$P(A \mid B, C) = P(A \mid B) \quad \text{or} \quad P(B, C) = 0.$$

Let $UN(A \mid B)$ be the set of all events $C$ such that $P(B, C) = 0$ or $P(A \mid B, C) =
P(A \mid B)$.

Lemma 3.1 $UN(A \mid B)$ is closed under countable disjoint union.

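Proof Suppose $\{C_i\}_{i=1}^{\infty}$ belong to $UN(A \mid B)$ and $C_i \cap C_j = \emptyset$, $i \neq j$. If $P(B \cap C_i) = 0$ for all $i$, then the
result is trivial. Otherwise, let $I = \{i \mid P(B \cap C_i) > 0,\ i = 1, 2, \dots\}$. Then

$$\begin{aligned}
P(A \mid B, \cup_{i=1}^{\infty} C_i) &= \frac{P(A, B, \cup_{i=1}^{\infty} C_i)}{P(B, \cup_{i=1}^{\infty} C_i)} \\
&= \frac{\sum_{i \in I} P(A, B, C_i)}{\sum_{i \in I} P(B, C_i)} = \frac{\sum_{i \in I} P(A \mid B, C_i) P(B, C_i)}{\sum_{i \in I} P(B, C_i)} \\
&= \frac{\sum_{i \in I} P(A \mid B) P(B, C_i)}{\sum_{i \in I} P(B, C_i)} = P(A \mid B).
\end{aligned}$$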
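
For a finite sample space, Lemma 3.1 can be checked exhaustively. The following sketch assumes a hypothetical six-point space with the uniform measure and arbitrarily chosen events `A` and `B`; none of these values come from the paper. It collects all events in $UN(A \mid B)$ and verifies that every disjoint pair has a union that is again uninformative.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical sample space with the uniform measure (an assumption for this sketch).
OMEGA = frozenset(range(1, 7))


def prob(event):
    return Fraction(len(event & OMEGA), len(OMEGA))


def is_uninformative(c, a, b):
    """True if C is in UN(A|B): P(B,C) = 0 or P(A|B,C) = P(A|B). Requires P(B) > 0."""
    if prob(b & c) == 0:
        return True
    return prob(a & b & c) / prob(b & c) == prob(a & b) / prob(b)


def all_events():
    points = sorted(OMEGA)
    return [frozenset(s) for r in range(len(points) + 1) for s in combinations(points, r)]


def closed_under_disjoint_union(a, b):
    """Check that the union of any two disjoint members of UN(A|B) is in UN(A|B)."""
    un = [c for c in all_events() if is_uninformative(c, a, b)]
    return all(
        is_uninformative(c1 | c2, a, b)
        for c1 in un
        for c2 in un
        if not (c1 & c2)
    )


A = frozenset({1, 2, 3})
B = frozenset({1, 2, 3, 4})
```

The check covers only finite unions on one small space, so it illustrates the lemma rather than proving it.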



One might also conjecture that $UN(A \mid B)$ is closed under intersection. We
show by counterexamples that this is not true.

Example 3.1 Let $\Omega = \{1, 2, 3, 4, 5, 6, 7, 8\}$, $A = \{1, 2, 3, 4\}$, $B = \Omega$, $C_1 = \{2, 4, 6, 8\}$, $C_2 =
\{1, 3, 5, 8\}$ and consider a uniform probability distribution on $\Omega$.

Then $P(A \mid B) = P(A) = 1/2$ and $P(A \mid B, C_1) = P(A \mid B, C_2) = 1/2$, hence
$C_1, C_2 \in UN(A \mid B)$. But $P(A \mid B, C_1, C_2) = 0$ while $P(B, C_1, C_2) = 1/8 \neq 0$.
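
The numbers in Example 3.1 can be reproduced with exact rational arithmetic. The sets below are those of the example; the helper names `prob` and `cond` are introduced only for this sketch.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 9))  # {1, ..., 8} with the uniform distribution
A = frozenset({1, 2, 3, 4})
B = OMEGA
C1 = frozenset({2, 4, 6, 8})
C2 = frozenset({1, 3, 5, 8})


def prob(event):
    return Fraction(len(event), len(OMEGA))


def cond(a, given):
    """P(a | given); assumes P(given) > 0."""
    return prob(a & given) / prob(given)


p_a_b = cond(A, B)               # P(A|B)
p_a_b_c1 = cond(A, B & C1)       # P(A|B,C1)
p_a_b_c2 = cond(A, B & C2)       # P(A|B,C2)
p_a_b_c1c2 = cond(A, B & C1 & C2)  # P(A|B,C1,C2)
p_b_c1c2 = prob(B & C1 & C2)     # P(B,C1,C2)
```

Both `C1` and `C2` leave the probability of `A` unchanged, but their intersection is the single point 8, which lies outside `A`.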

Example 3.2 Consider the joint distribution of $(X, Y, Z)$ given in Table 1, where
every row has the same probability of 1/4. Suppose that two agents want to predict
the value of $X$. The first agent does not have any information and the second one
knows that $Z = 0$. Now, assume that we provide extra information to both agents.
The extra information is the value of $Y$. For the first agent at the beginning (before
the information about $Y$ was given): $P(X = 0) = P(X = 1) = 1/2$. After he
knows the value of $Y$: $P(X = 1 \mid Y = 0) = P(X = 1 \mid Y = 1) = 1/2$. Hence, the
extra information does not change the belief of the first agent about $X$. One might
conjecture that since the second agent has more information than the first and the
new information did not help the first agent update his belief, it should not change
the belief of the second agent either. This is not true! In fact, after getting the extra
information, we have the following inequality for the second agent:

$$0 = P(X = 1 \mid Z = 0, Y = 1) \neq P(X = 1 \mid Z = 0, Y = 0) = 1/2.$$
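
The two agents' beliefs in Example 3.2 can be computed directly from a table of equally likely rows. The rows in this sketch are an assumption: they are one set of four rows chosen to reproduce the probabilities stated in the example, not a verbatim copy of Table 1. The function name `p_x1` is also introduced here.

```python
from fractions import Fraction

# ASSUMED rows (x, y, z), each with probability 1/4, chosen to match the
# probabilities stated in Example 3.2.
ROWS = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]
COLUMN = {"y": 1, "z": 2}


def p_x1(**given):
    """P(X = 1 | given), e.g. p_x1(z=0, y=1); assumes the condition has positive probability."""
    matching = [r for r in ROWS if all(r[COLUMN[k]] == v for k, v in given.items())]
    return Fraction(sum(r[0] for r in matching), len(matching))


# First agent: no prior information, then learns Y.
first_before = p_x1()
first_after = (p_x1(y=0), p_x1(y=1))

# Second agent: knows Z = 0, then learns Y.
second_after = (p_x1(z=0, y=0), p_x1(z=0, y=1))
```

With these rows the first agent's belief stays at 1/2 whatever the value of $Y$, while the second agent's belief depends on $Y$.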



   X   Y   Z
   0   0   0
   1   0   0
   0   1   0
   1   1   1

Table 1: The joint distribution of X, Y, Z

We prove a seemingly trivial fact about conditional probabilities in the
following lemma.

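Lemma 3.2 Suppose $P(A \mid B)$ is defined. Also suppose $\{C_i\}_{i=1}^{k}$, $k \in \{1, 2, \dots, \infty\}$, is a
(finite or countable) collection of disjoint sets such that $\cup_{i=1}^{k} C_i = \Omega$. Assume that, for some constant $c$ and every $i$,

$$P(B, C_i) = 0 \quad \text{or} \quad P(A \mid B, C_i) = c.$$

In other words, $P(A \mid B, C_i)$ does not depend on $C_i$. Then $C_i \in UN(A \mid B)$:

$$P(A \mid B, C_i) = P(A \mid B) \quad \text{or} \quad P(B, C_i) = 0.$$

Proof Let $I = \{i \mid 1 \le i \le k,\ P(B, C_i) > 0\}$. Then we have

$$P(A \mid B) = \frac{P(A, B)}{P(B)} = \frac{\sum_{i \in I} P(A, B, C_i)}{\sum_{i \in I} P(B, C_i)} = \frac{\sum_{i \in I} P(A \mid B, C_i) P(B, C_i)}{\sum_{i \in I} P(B, C_i)} = c.$$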



Corollary 3.1 Suppose $P(x_i \mid x_{i_1}, \dots, x_{i_I})$ depends only on $x_{j_1}, \dots, x_{j_J}$, where

$$\{j_1, \dots, j_J\} \subset \{i_1, \dots, i_I\},$$

when the conditional probability $P(x_i \mid x_{i_1}, \dots, x_{i_I})$ is defined. Then

$$P(x_i \mid x_{i_1}, \dots, x_{i_I}) = P(x_i \mid x_{j_1}, \dots, x_{j_J}),$$

when the conditional probability $P(x_i \mid x_{i_1}, \dots, x_{i_I})$ is defined.

Proof Fix $(x'_{j_1}, \dots, x'_{j_J})$. Let $A = \{X_i = x_i\}$ and $B = \{X_{j_1} = x'_{j_1}, \dots, X_{j_J} = x'_{j_J}\}$.
Let

$$\{k_1, \dots, k_K\} = \{i_1, \dots, i_I\} - \{j_1, \dots, j_J\}.$$

Consider the sets

$$C_{x_{k_1}, \dots, x_{k_K}} = \{X_{k_1} = x_{k_1}, \dots, X_{k_K} = x_{k_K}\}, \quad x_{k_l} \in M_{k_l}.$$

These sets are disjoint, there are finitely many of them and their union is $\Omega$. Then
by the assumption $P(A \mid B, C_{x_{k_1}, \dots, x_{k_K}}) = c$ or $P(B, C_{x_{k_1}, \dots, x_{k_K}}) = 0$. Now apply
Lemma 3.2 to $A$, $B$, $C_{x_{k_1}, \dots, x_{k_K}}$. ■

4 Sufficient and minimal information sets

This section introduces minimal and sufficient information sets. Suppose we have
$n$ sites in the random field indexed by $1, 2, \dots, n$. We denote a site by $i$. Let
$i^c = \{1, 2, \dots, n\} - \{i\}$ be the set of all sites other than site $i$. Let $\mathcal{I} =
\{i_1, \dots, i_I\} \subset \{1, 2, \dots, n\}$ be a collection of sites and let

$$D_{\mathcal{I}} = \{(x_{i_1}, \dots, x_{i_I}) : P(x_{i_1}, \dots, x_{i_I}) > 0\}.$$

Note that $D_{\mathcal{I}}$ depends on the set of subscripts and not on their order. Also
note that $D_{\mathcal{I}}$ is the domain where the conditional probability given the values on
the sites $\mathcal{I}$ is defined. By $P(i \mid \mathcal{I})$, we mean the conditional probability of site $i$ given
$\mathcal{I}$, defined on $E_{i,\mathcal{I}} = M_i \times D_{\mathcal{I}}$. Also note that under the assumption of positivity of the joint
distribution:

$$D_{\mathcal{I}} = \prod_{j \in \mathcal{I}} M_j.$$

Since the concept of neighbor is not well-defined in the general case, we seek other
useful definitions to study the general case.
Note that $P(i \mid \mathcal{I})$ is a function

$$P(i \mid \mathcal{I}) : M_i \times D_{\mathcal{I}} \to [0, 1].$$

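Definition 4.1 Sufficient information set: Suppose $\mathcal{J} \subset \mathcal{I} \subset \{1, 2, \dots, n\}$. $\mathcal{J}$ is
called a sufficient information set for $i$, given $\mathcal{I}$, if

$$P(i \mid \mathcal{I}) = P(i \mid \mathcal{J})$$

on $E_{i,\mathcal{I}}$. We denote the set of all such sets by $SI(i, \mathcal{I})$.

Definition 4.2 $\mathcal{I} \subset \{1, 2, \dots, n\}$ is called a minimal information set for $i$ if $P(i \mid \mathcal{I}) \neq
P(i \mid \mathcal{J})$ for any $\mathcal{J} \subset \mathcal{I}$, $\mathcal{J} \neq \mathcal{I}$. We denote the set of all such sets by $MI(i)$.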
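
Definitions 4.1 and 4.2 can be checked by brute force on a small field. The sketch below is a minimal illustration under stated assumptions: the joint distribution is a hypothetical list of equally likely rows over sites named `X`, `Y`, `Z` (chosen to be consistent with the probabilities quoted in Example 3.2), the site of interest is assumed not to belong to the conditioning set, and the function names are introduced here.

```python
from fractions import Fraction
from itertools import combinations

# ASSUMED joint distribution: equally likely rows over the sites below.
SITES = ("X", "Y", "Z")
ROWS = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]


def cond(i, xi, given):
    """P(site i = xi | given), with given a dict site -> value; None if undefined."""
    k = SITES.index(i)
    matching = [r for r in ROWS if all(r[SITES.index(s)] == v for s, v in given.items())]
    if not matching:
        return None
    return Fraction(sum(r[k] == xi for r in matching), len(matching))


def same_on_domain(i, big, small):
    """True if P(i|big) = P(i|small) everywhere on M_i x D_big (small must be a subset of big)."""
    k = SITES.index(i)
    m_i = {r[k] for r in ROWS}
    # D_big: value combinations on the sites in `big` that have positive probability.
    d_big = {tuple(r[SITES.index(s)] for s in big) for r in ROWS}
    for values in d_big:
        given_big = dict(zip(big, values))
        given_small = {s: given_big[s] for s in small}
        if any(cond(i, xi, given_big) != cond(i, xi, given_small) for xi in m_i):
            return False
    return True


def is_sufficient(i, small, big):
    """Definition 4.1: `small` is in SI(i, big)."""
    return set(small) <= set(big) and same_on_domain(i, big, small)


def is_minimal(i, big):
    """Definition 4.2: P(i|big) differs from P(i|J) for every proper subset J of `big`."""
    proper = [c for r in range(len(big)) for c in combinations(big, r)]
    return not any(same_on_domain(i, big, c) for c in proper)
```

For instance, `is_minimal("X", ("Y", "Z"))` holds for these rows while `is_minimal("X", ("Y",))` does not, because conditioning on `Y` alone gives the same distribution as conditioning on nothing.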

In the following, we study the properties of SI (sufficient information) and 
MI (minimal information) sets. 

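First, let us see what happens if $i \in \mathcal{I}$. In this case, $\{i\} \in SI(i, \mathcal{I})$. Also,
note that in general $\{i\} \in MI(i)$ if $m_i > 1$. (If $m_i = 1$ then we need no information
to say what the value of site $i$ is.) Also note that $\emptyset \in MI(i)$ in general.

One might conjecture that a smaller set than a given minimal information set is
a minimal information set as well. This is not true! In Example 3.2, $\{Y, Z\} \in MI(X)$ but $\{Y\}$ is
not minimal since $P(X \mid Y) = P(X \mid \emptyset)$.

Proposition 4.1 Suppose $\mathcal{J} \in SI(i, \mathcal{I})$ and $\mathcal{H} = \mathcal{I} - \mathcal{J} = \{h_1, \dots, h_H\}$. Also assume

$$N_{h_1} \subset M_{h_1}, \dots, N_{h_H} \subset M_{h_H},$$

then

$$P(i \mid \mathcal{J}) = P(i \mid \mathcal{J}, x_{h_1} \in N_{h_1}, \dots, x_{h_H} \in N_{h_H}),$$

whenever the right hand side is defined.

Proof Fix $(x'_{j_1}, \dots, x'_{j_J})$. We want to show

$$P(x_i \mid x'_{j_1}, \dots, x'_{j_J}, x_{h_1} \in N_{h_1}, \dots, x_{h_H} \in N_{h_H}) = P(x_i \mid x'_{j_1}, \dots, x'_{j_J}),$$

whenever the left hand side is defined. But

$$P(x_i \mid x'_{j_1}, \dots, x'_{j_J}, x_{h_1}, \dots, x_{h_H}) = P(x_i \mid x'_{j_1}, \dots, x'_{j_J})$$

or

$$P(x'_{j_1}, \dots, x'_{j_J}, x_{h_1}, \dots, x_{h_H}) = 0,$$

since $\mathcal{J}$ is sufficient. Now use the fact that $UN$ is closed under disjoint union and
take the union over

$$\{X_{j_1} = x'_{j_1}, \dots, X_{j_J} = x'_{j_J}, X_{h_1} = x_{h_1}, \dots, X_{h_H} = x_{h_H}\}_{x_{h_1} \in N_{h_1}, \dots, x_{h_H} \in N_{h_H}}.$$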



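Lemma 4.1 a) If $\mathcal{J} \in SI(i, \mathcal{I})$ and $\mathcal{J} \subset \mathcal{H} \subset \mathcal{I}$ then $\mathcal{J} \in SI(i, \mathcal{H})$.
b) If $\mathcal{J} \in SI(i, \mathcal{I})$ and $\mathcal{J} \subset \mathcal{H} \subset \mathcal{I}$ then $\mathcal{H} \in SI(i, \mathcal{I})$.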



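Proof

Let $\mathcal{K} = \mathcal{I} - \mathcal{H}$, $\mathcal{K} = \{k_1, \dots, k_K\}$. We want to show that for a fixed $(x'_{h_1}, \dots, x'_{h_H})$:

a) $P(x_i \mid x'_{h_1}, \dots, x'_{h_H}) = P(x_i \mid x'_{j_1}, \dots, x'_{j_J})$,

b) $P(x_i \mid x_{i_1}, \dots, x_{i_I}) = P(x_i \mid x'_{h_1}, \dots, x'_{h_H})$.

By assumption, for all $(x_{i_1}, \dots, x_{i_I})$ whose restriction to the indices in $\mathcal{H}$ is $(x'_{h_1}, \dots, x'_{h_H})$,

$$P(x_i \mid x_{i_1}, \dots, x_{i_I}) = P(x_i \mid x'_{j_1}, \dots, x'_{j_J}).$$

On the left hand side take the union over $\{X_{k_1} = x_{k_1}, \dots, X_{k_K} = x_{k_K}\}$. We
get

$$P(x_i \mid x'_{h_1}, \dots, x'_{h_H}) = P(x_i \mid x'_{j_1}, \dots, x'_{j_J}),$$

which is a). Combining a) with the assumption gives b).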



To generalize the concept of neighbor, we can use the sufficient information
and minimal information sets. We call a set efficiently sufficient for site $i$ if it is
minimal and sufficient for $i$ given $i^c$, i.e. $\mathcal{I}$ is efficiently sufficient for $i$ if and only
if $\mathcal{I} \in MI(i) \cap SI(i, i^c)$. We denote the set of all such sets by $ES(i)$. If for some $i$,
$ES(i)$ has only one element, we call that element a neighbor of site $i$. Note that
this definition of neighbor coincides with the definition of neighbor by Besag (1974)
and Cressie and Subash (1992) if the positivity condition holds. In the following
example we show that the positivity condition is not necessary.

Example 4.1 Consider the joint distribution of $X, Y$ as given by Table 2, where
every row is equally probable. Then the positivity condition does not hold since
$P(X = 1, Y = 0) = 0$. But for $X$, the site $Y$ is a neighbor since $Y \in MI(X) \cap
SI(X, Y)$. Also for $Y$, $X$ is a neighbor.



Table 2: The joint distribution of X, Y